I Released TMLM-Haiku-1.3 And It Is Still Dumb
I released TMLM-Haiku-1.3 today. It is on Hugging Face. It is open weights. It is still completely devoid of intelligence. I trained it with Muon. I spent electricity. I generated heat. The model still thinks Paris is a person.
You might ask why I keep doing this. You might ask why I versioned it to 1.3 instead of 2.0. You might ask why I used Muon instead of AdamW. I do not have good answers. I have weights.
Progress is not always vertical. Sometimes it is horizontal. Sometimes it is circular. Sometimes it is just releasing the same dumb model with a different optimizer.
The Muon Experiment
AdamW is standard. SGD is classic. Muon is new. It claims better convergence for transformers. It claims to handle large batch sizes better. It claims to be worth the hype. I wanted to test the claims.
I switched the optimizer. I kept the data. I kept the architecture. I kept the low expectations. The training loss went down faster. The validation loss still plateaued. The model still outputs fish facts when asked for math.
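Muon's trick, as I understand it, is orthogonalizing each gradient matrix with a few Newton-Schulz iterations before applying the update. Here is a minimal NumPy sketch of that core step, using the coefficients from the reference Muon write-up. It is a sketch, not the real optimizer. The real one also handles momentum, non-square matrices, and the parameters Muon does not cover.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximately orthogonalize a gradient matrix.

    This is the quintic Newton-Schulz iteration at the core of Muon,
    sketched in NumPy. Coefficients follow the reference write-up.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    # Frobenius-normalize so all singular values start at or below 1.
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x

rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 8))
ortho = newton_schulz_orthogonalize(grad)

# Orthogonality error before and after: distance of X @ X.T from identity.
normed = grad / np.linalg.norm(grad)
before = np.linalg.norm(normed @ normed.T - np.eye(8))
after = np.linalg.norm(ortho @ ortho.T - np.eye(8))
print(before, after)
```

The printed numbers are the orthogonality error before and after. The second one is much smaller. That is the whole trick. It does not make the model smart. It makes the gradients polite.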
Haiku-1.0: AdamW, 261 hours, 600W
Haiku-1.3: Muon, 198 hours, 800W
# Faster training. More power. Same stupidity.
The training finished in 198 hours instead of 261. That is a twenty-four percent speedup. I attribute this to Muon. I also attribute it to the 800W overclocked VBIOS I flashed last week. The GPU was screaming. The loss was descending. The result is unchanged.
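The arithmetic on those two runs is worth doing. Back-of-envelope, assuming the GPU sat at its power limit for the whole run, which is optimistic:

```python
# Energy check for the two runs listed above.
# Assumes the GPU held its power limit the entire time (optimistic).
runs = {
    "Haiku-1.0 (AdamW)": {"hours": 261, "watts": 600},
    "Haiku-1.3 (Muon)": {"hours": 198, "watts": 800},
}

for name, r in runs.items():
    kwh = r["hours"] * r["watts"] / 1000
    print(f"{name}: {kwh:.1f} kWh")

speedup = 1 - 198 / 261
print(f"time saved: {speedup:.1%}")
```

About 157 kWh either way. The optimizer saved time. The VBIOS spent it back as watts. Net energy savings: approximately zero.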
Intelligence Report
I tested it. I asked simple questions. It gave complex wrong answers. It is confident. It is fluent. It is incorrect. This is the hallmark of a modern language model. I have successfully replicated industry standards in my bedroom.
Why Version 1.3
Version 2.0 implies improvement. Version 2.0 implies a new architecture. Version 2.0 implies I solved something. I did not solve anything. I changed the optimizer. I tweaked the learning rate schedule. I added more dropout.
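The schedule tweak was nothing exotic. A common shape for this kind of thing is linear warmup into cosine decay. The sketch below is generic; the numbers in it are illustrative, not the actual hyperparameters from the Haiku-1.3 run.

```python
import math

def lr_at(step, max_steps, peak_lr=3e-4, warmup_steps=500, min_lr=3e-5):
    """Linear warmup into cosine decay.

    Generic sketch. The values here are illustrative, not the real
    run's config.
    """
    if step < warmup_steps:
        # Ramp linearly from near zero up to the peak.
        return peak_lr * (step + 1) / warmup_steps
    # Then decay along a half cosine from peak_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(0, 10_000), lr_at(500, 10_000), lr_at(10_000, 10_000))
```

Up, then down. It did not make the model smarter either. It made the loss curve prettier.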
Version 1.3 is honest. It says this is a minor update. It says do not expect miracles. It says the fish facts are still included at no extra cost. I value honesty in versioning.
The Hardware Impact
This model was trained on the Astral ROG RTX 5090 OC LC. The one with the Matrix VBIOS. The one running at 800W. The one that heats my room like a furnace. The Muon optimizer tolerated larger batch sizes. Larger batch sizes meant more VRAM in use. More work per step meant the 800W power limit was pinned for the entire run.
My electricity bill hates me. My GPU loves me. The model does not care. It exists. It consumes tokens. It produces nonsense. It is alive in the way a spreadsheet is alive.
I spent eight hundred watts to make a model that cannot count. This is art. This is science. This is a waste of money. All three can be true.
What Changed
Technically? The loss curve is smoother. The gradients are more stable. The training did not NaN this time. I consider this a major victory. After the NaN disaster of last week, a completed training run feels like a miracle.
Functionally? Nothing. It still does not know the capital of France. It still thinks two plus two is a philosophical question. It still apologizes profusely when it is wrong. Then it gives another wrong answer.
Download It If You Want
Free. Open weights. Trained with Muon. Still dumb. Run it locally. Save the API costs. Get fish answers directly on your hardware.
Future Plans
Sonnet is still training. It is at 12 percent now. The overclocked GPU is helping. The Muon optimizer is being tested on Sonnet too. If Haiku-1.3 is any indication, Sonnet will be faster to train and equally disappointing.
Opus is still a dream. A 600M parameter dream. A dream that requires me to not burn my house down. I am working on it. Slowly. Painfully. With too much power.
Final Thoughts
I released a model. It is not smart. It is faster to train. It uses more electricity. I am proud of it. This is what hobbyists do. We build things. We release them. We accept their flaws. We love them anyway.
If you download it, please be kind. It is trying its best. Its best is not good. But it is trying. Just like me.